Abstract


This  paper  presents  an  application  of  vision-based  object  recognition  to  create  a 
leader/follower  behaviour  in  mobile  robots.  A  system  is  developed  which  makes  use  of  the 
Scale Invariant Feature Transform (SIFT) algorithm to recognize a leader robot or human. The
follower  robot  then  uses  PID  control  to  track  the  leader’s  movements  while  maintaining  a 
fixed  following  distance. 


Executive  summary


Leader/Follower  Behaviour  Using  the  SIFT  Algorithm  for 
Object  Recognition 

J.  Giesbrecht;  DRDC  Suffield  TM  2006-108;  Defence  R&D  Canada  -  Suffield;  June  2006. 

Background:  Human/robot  or  multi-robot  teams  have  a  wide  variety  of  potential 
applications.  The  ability  of  an  autonomous  mobile  robot  to  control  its  position  relative  to  team 
members  is  a  key  enabling  component.  This  work  focuses  on  a  small,  simple  autonomous 
robot  following  a  lead  human  or  robot  using  only  an  inexpensive  video  camera,  maintaining 
its  position  relative  to  the  leader  by  recognizing  an  object  which  it  has  been  trained  to  follow. 
This  process  requires  two  components  in  the  follower  robot:  a  visual  recognition  system 
which  can  provide  information  about  the  relative  pose  of  the  leader,  and  a  control  system 
which  uses  this  information  to  adjust  the  follower  robot’s  speed  and  direction. 

Principal  Results:  The  follower  system  relies  upon  object  recognition  using  the  Scale 
Invariant  Feature  Transform  (SIFT)  algorithm,  built  into  the  ViPR  software  libraries  in  the 
Evolution  Robotics  ERSP  toolkit.  This  technique  extracts  feature  points  from  a  training  image 
and  compares  these  feature  points  to  those  extracted  from  successive  camera  images  to 
recognize  the  leader’s  position.  Using  positional  information  from  the  SIFT  object  recognition 
code,  the  tracking  controller  used  simple  PID  loops  on  the  robot’s  translational  and  rotational 
velocity  to  follow  the  leader  and  maintain  a  safe  following  distance. 

This  system  was  tested  in  an  indoor  office  environment,  and  was  able  to  follow  arbitrary 
leader  objects  at  moderate  walking  speeds.  However,  due  to  the  direct  pursuit  nature  of  the 
controller,  the  robot  would  bump  intervening  obstacles  if  the  distance  between  the  leader  and 
follower  was  too  great. 

Significance of Results: These experiments demonstrated that the SIFT algorithm is a viable
method of creating leader/follower behaviour, and can serve as a proof of concept for more
complex convoying operations using machine-vision based leader detection.


Future Work: Given the current simplicity of this system, a large number of improvements
and extensions are possible. Object recognition could be improved through the
use of higher-resolution or zoom cameras. Improving path following performance
beyond that enabled by direct PID pursuit would require global localization (such as GPS).
With this, a pan/tilt unit could be added to the system to allow the vision system to
track the leader’s position while the robot follows it using the Pure Pursuit path tracking
algorithm. The potential applications of this type of system, such as a military convoying
system, need to be explored. This will require adaptation to an Ackermann-steered vehicle, and
testing of the algorithm in less controlled, outdoor lighting conditions. Further system
development would also see the introduction of a variety of robot behaviours through the
recognition of a variety of fiducial objects, leading to much easier cooperation between
human and machine.


1  Introduction


There  exist  a  wide  variety  of  potential  applications  involving  human/robot  or  multi-robot 
teams.  The  ability  of  an  autonomous  mobile  robot  to  control  its  position  relative  to  team 
members  is  a  key  enabling  component.  This  work  focuses  on  a  small,  simple  autonomous 
robot  following  a  lead  human  or  robot  using  only  an  inexpensive  video  camera,  maintaining 
its  position  relative  to  the  leader  by  recognizing  an  object  which  it  has  been  trained  to 
follow.  This  process  requires  two  components  in  the  follower  robot:  a  visual  recognition 
system  which  can  provide  information  about  the  relative  pose  of  the  leader,  and  a  control 
system  which  uses  this  information  to  adjust  the  follower  robot’s  speed  and  direction. 

2  Background 

2.1  Target  Recognition 

Many  unmanned  systems  have  accomplished  leader/follower  behaviour  by  sending  positional 
information over a wireless data link [1]. This requires radio infrastructure and position
finding  equipment  such  as  GPS,  both  of  which  are  prone  to  failure  and  consume  valuable 
bandwidth.  Furthermore,  it  is  advantageous  for  a  human  to  be  able  to  interact  with  a  robot 
without  the  need  for  electronic  equipment.  Therefore,  many  researchers  have  used  other 
detection  systems  to  locate  the  leader,  such  as  sonar [2]  and  laser  range  finders [3]  which  are 
range  limited,  prone  to  noise  interference,  and  again  require  complex  hardware. 

A suitable solution to the problem is to use an inexpensive video camera and image processing
techniques to find the leader’s pose. The most basic approaches track a colour fiducial
on the leader in consecutive video images, using the perceived size of the fiducial to estimate
distance [4, 5]. Another approach recognizes the colour of a human leader’s shirt [6].
These methods, although effective, can easily fail if a similarly coloured object enters the robot’s
field of view or under changing lighting conditions. In an extension to this method, some
researchers use colour fiducials of a specific shape which allow additional information to be
gathered, such as leader roll, pitch and yaw [2, 7, 8]. However, in addition to the previously
mentioned limitations, this also makes the system sensitive to partial occlusion by obstacles
between the leader and follower.

Recognizing  an  object  on  the  leader  human  or  robot  to  track  it  directly  is  a  more  robust  and 
practical  alternative.  One  approach  used  the  taillights  of  a  lead  vehicle,  which  is  effective  at 
night [9]. Another implementation uses template-based image recognition to retrieve leader
distance and orientation from images [10]. This type of approach provides insensitivity to
target occlusion and changes in lighting, and does not require the use of special fiducials on
the lead robot or human.

2.2  Path  Following 

Once  the  position  of  the  leader  has  been  established,  a  control  scheme  must  be  implemented 
to  actuate  the  robot’s  movements.  There  exist  two  basic  options:  attempt  to  match  the 
leader’s  complete  path,  or  simply  follow  the  leader’s  current  position  directly.  To  implement 
the former, the robot must be able to keep a record of both its own and the leader’s position
very accurately in world coordinates [1, 9]. It then repeats the leader’s traverse using a path
tracking algorithm such as Pure Pursuit [11]. Unfortunately, keeping track of position in
world coordinates is not necessarily an easy problem. Additionally, this approach makes it
more difficult for the follower to keep its vision sensor aimed at the lead robot, unless the
robot is equipped with a pan/tilt camera [4, 7].

Figure 1: The ER1 robot with camera and laptop.

For  simple  robots,  it  is  more  practical  to  pursue  the  lead  vehicle  directly.  A  number  of  works 
are  available  on  the  topic.  If  the  leader’s  position  and  orientation  are  known,  the  follower 
can calculate a trajectory to attain that pose using the Vector Pursuit method [12], or with
Bézier trajectories [7, 13]. Another work uses a “virtual trailer link” model to recreate
the leader’s motions when following closely [3]. These approaches have the advantage of
following the leader’s motions more accurately, but will not keep the vision sensor aimed at
the leader for tracking purposes. If no orientation information is available, a “tail chase”
method is often adopted whereby the leader’s current position or position/velocity is
considered as a target to pursue using a kinematic model of the vehicle [14, 15].

An  even  simpler  approach  reduces  the  problem  of  controlling  robot  motion  in  Cartesian 
coordinates to a visual servoing problem [2, 5, 16, 17]. The robot simply tries to keep the
recognized  image  centered  in  its  field  of  view  by  controlling  the  wheels  of  the  robot.  This 
method  removes  all  modeling  in  world  coordinates  and  ensures  that  the  lead  vehicle  stays 
in  the  follower’s  field  of  view.  The  downside  to  this  approach  is  that  the  leader’s  trajectory 
is  not  followed  as  accurately,  and  the  robot  will  cut  corners  dramatically  if  it  falls  a  large 
distance  behind. 

The  novel  approach  taken  in  this  work  combines  the  powerful  object  recognition  techniques 
provided  by  the  SIFT  algorithm  to  follow  an  arbitrary  leader,  with  a  simple  PID  based 
control  scheme  suitable  for  a  small  indoor  robot. 

3  Hardware 


The robot used in this application was the ER1 from Evolution Robotics, shown in Figure
1. It has two independently driven wheels and power provided by an on-board battery pack.
It is a small, simple and low cost robot measuring approximately 80cm high and 40cm wide.
The only sensor used in this work is an IRez Kritter USB camera with 640x480 resolution
mounted at the top of the robot. Processing power was provided by a Dell D600 laptop
running Fedora Core 3 Linux on a Pentium M 2.0GHz CPU. Two Lucent Orinoco 802.11
PCMCIA wireless cards were used for communication with the robot laptop from a base
computer.

Figure 2: A trained image and the recognized image in a cluttered scene, with the red box
indicating the position of the recognized object. Feature points are shown as yellow circles.

4  Target  Recognition  Using  SIFT 


The follower system relies upon object recognition using the Scale Invariant Feature Transform
(SIFT) algorithm [18], built into the ViPR software libraries in the Evolution Robotics
ERSP toolkit [19]. This technique extracts feature points from a training image and compares
these feature points to those extracted from successive camera images. For a planar
leader object, only one training image is necessary. For 3D objects, training images from
different views make the algorithm more robust.

The  algorithm  detects  unique  features  in  an  image  of  an  object  by  analyzing  the  texture 
of  a  small  window  of  pixels.  Up  to  1,000  feature  points  are  extracted  from  an  image,  each 
consisting of the feature’s location and a texture description. A small subset of these
features, filtered for uniqueness and robustness, makes up a model database for that object.

When attempting to recognize an object, the algorithm extracts similar types of features from
a newly acquired image and associates them with those in the trained model database by
comparing their texture descriptors. If there is a high number of feature
matches between the acquired image and a trained model, a potential match is supposed.
At this point, it attempts to match the acquired image to the trained model by applying an
affine transform. If the difference between the transformed acquired image and the original
trained model is low enough, it declares a match.
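
The matching step described above can be illustrated with a minimal sketch. This is not the ViPR implementation, whose internals are not given here: the descriptor format, the `max_dist` acceptance radius and the `min_matches` threshold are all hypothetical, and the affine verification stage is omitted.

```python
import math

def match_features(model, scene, max_dist=0.3):
    """Pair each model descriptor with its nearest scene descriptor.

    model, scene: lists of equal-length descriptor vectors (lists of floats).
    max_dist is a hypothetical acceptance radius, not a ViPR parameter.
    Returns a list of (model_index, scene_index) pairs.
    """
    matches = []
    for i, m in enumerate(model):
        best_j, best_d = None, float("inf")
        for j, s in enumerate(scene):
            d = math.dist(m, s)  # Euclidean distance between descriptors
            if d < best_d:
                best_j, best_d = j, d
        # Keep the pairing only if the nearest descriptor is close enough
        if best_j is not None and best_d <= max_dist:
            matches.append((i, best_j))
    return matches

def is_candidate(model, scene, min_matches=3, max_dist=0.3):
    """Suppose a potential match when enough features agree (assumed threshold)."""
    return len(match_features(model, scene, max_dist)) >= min_matches
```

In a full system, a candidate found this way would still have to pass the affine-transform check before a match is declared.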

This  method  has  a  number  of  very  desirable  characteristics  for  real  world  applications.  It 
is  unaffected  by  changes  in  scale,  rotation  and  translation.  It  also  has  some  robustness 
to  changes  in  lighting,  and  can  be  used  on  low  cost,  low  resolution  cameras.  Finally, 
the  algorithm  will  typically  recognize  objects  with  50%  to  90%  occlusion.  It  specializes 
in planar, textured objects, but also works well with 3D objects having slightly curved
components. A model image and the subsequent recognized image are shown in Figure 2.

Figure 4: Geometry of robot and target.

5  Leader  Tracking 


The object recognition stage outputs a positive or negative recognition for each camera
image. For positive recognition, it also supplies the model name, the pixel location of the
object’s center in y,z coordinates, and a distance estimate ($t_y$, $t_z$ and $\rho$).

This information can be used to find the leader’s relative position. The camera used has a
field of view of 46 degrees horizontally spread over 640 pixels, resulting in approximately
0.00125 radians/pixel, $S$. This constant is an approximation obtained from experiment,
and does not take into account the unknown characteristics of the camera lens. The minimal
effects of this simplification are reviewed in Section 7.1.2. Figure 3 shows the image
geometry. If the pixel coordinates of the center of the image are $C_y$ and $C_z$, the angles to
the leader in the horizontal and the vertical planes, $\theta$ and $\gamma$, are found as follows:

$$\theta = (C_y - t_y)\,S \tag{1}$$

$$\gamma = (C_z - t_z)\,S \tag{2}$$
From this, the distance from the camera to the target in the x,y plane ($\rho_{xy}$) can be found:

$$\rho_{xy} = \rho \cos(\gamma) \tag{3}$$

The vector $\mathbf{d}_c^t$ from the camera to the target is:

$$\mathbf{d}_c^t = \begin{bmatrix} \rho_{xy} \cos(\theta) \\ \rho_{xy} \sin(\theta) \\ \rho \sin(\gamma) \end{bmatrix} \tag{4}$$

Finally, the vector from the robot’s wheels to the target, $\mathbf{d}_r^t$, is found using the static vector
from the wheels to the camera, $\mathbf{d}_r^c$:

$$\mathbf{d}_r^t = \mathbf{d}_r^c + \mathbf{d}_c^t \tag{5}$$

This  vector  is  used  to  control  the  vehicle’s  motion  relative  to  the  target. 
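
The geometry in this section can be written as a short function. This is a sketch under stated assumptions: the image center is taken as pixel (320, 240) for the 640x480 camera, and the wheel-to-camera offset passed as `cam_offset` is a hypothetical value, since the actual mounting dimensions are not given.

```python
import math

S = 0.00125          # radians per pixel, from the text
C_Y, C_Z = 320, 240  # assumed image center for a 640x480 image

def leader_vector(t_y, t_z, rho, cam_offset=(0.0, 0.0, 0.0)):
    """Return (theta, gamma, d_rt) for a recognized target.

    t_y, t_z: pixel location of the target center.
    rho: distance estimate from the recognition stage (cm).
    cam_offset: static wheels-to-camera vector (cm); hypothetical values.
    """
    theta = (C_Y - t_y) * S              # horizontal angle to the leader
    gamma = (C_Z - t_z) * S              # vertical angle to the leader
    rho_xy = rho * math.cos(gamma)       # range projected into the x,y plane
    d_ct = (rho_xy * math.cos(theta),    # camera-to-target vector
            rho_xy * math.sin(theta),
            rho * math.sin(gamma))
    d_rt = tuple(o + d for o, d in zip(cam_offset, d_ct))  # wheels-to-target
    return theta, gamma, d_rt
```

For a target in the center of the image, both angles are zero and the vector lies along the camera’s x-axis, shifted only by the camera offset.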

6  Robot  Control 

As discussed in Section 2, there are a variety of methods for control in a leader/follower
scenario. For simplicity’s sake, basic PID control is used on heading and velocity in this
application. The ER1’s low level controller accepts velocity commands in terms of rotational
and translational velocities in rad/sec and cm/sec ($\omega$, $v$), so the heading and distance
controls are decoupled and treated as separate control loops.

6.1  Heading  Control 

In order to pursue the leader directly, the heading controller attempts to fix the follower’s
heading directly at the leader by forcing the angle $\theta = 0$. The PID loop shown in Figure
5 changes the rotational velocity of the robot based upon the value of $\theta$ found in the
previous step. $K_{p\omega}$, $K_{i\omega}$ and $K_{d\omega}$ represent the proportional, integral and derivative gains
respectively for heading.

$$\omega = K_{p\omega}\,\theta + K_{d\omega}\,\frac{d\theta}{dt} + K_{i\omega} \int \theta(t)\,dt \tag{6}$$

However, this is the ideal form of PID control, and does not represent a practical implementation
in code. Therefore, it is approximated for each time step $n$ with sampling time
$T_s$:

$$\omega_n = K_{p\omega}\,\theta_n + K_{d\omega}\,\theta_{\Delta n} + K_{i\omega} \sum_{i=1}^{n} \theta_{\alpha i} \tag{7}$$

where

$$\theta_{\Delta n} = \frac{\theta_n - \theta_{n-1}}{T_s} \tag{8}$$

$$\theta_{\alpha i} = \frac{\theta_i + \theta_{i-1}}{2}\,T_s \tag{9}$$

Figure 5: PID loop for heading control.

Figure 6: PID loop for velocity control.


6.2  Velocity  Control 

The velocity controller aims to keep the follower robot at a fixed distance in the x,y plane
behind the leader robot while matching the leader’s velocity, as shown in Figure 6. This
means minimizing the error $e$ between the distance to the leader in the x,y plane, $\rho_{xy}$, and
the following distance $D_{xy}$ by adjusting the robot’s translational velocity $v$.

$$e = \rho_{xy} - D_{xy} \tag{10}$$

$$v_n = K_{pv}\,e_n + K_{dv}\,e_{\Delta n} + K_{iv} \sum_{i=1}^{n} e_{\alpha i} \tag{11}$$

where

$$e_{\Delta n} = \frac{e_n - e_{n-1}}{T_s} \tag{12}$$

$$e_{\alpha i} = \frac{e_i + e_{i-1}}{2}\,T_s \tag{13}$$
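
The discrete PID approximation used for the heading and velocity loops can be sketched as a small class. This is an illustrative sketch, not the report’s code: it assumes a trapezoidal integral term, a backward-difference derivative and a zero initial error, and the gains shown in the usage lines are hypothetical.

```python
class DiscretePID:
    """Discrete PID controller updated once per sampling period ts (seconds)."""

    def __init__(self, kp, ki, kd, ts):
        self.kp, self.ki, self.kd, self.ts = kp, ki, kd, ts
        self.prev_error = 0.0   # assumed initial error of zero
        self.integral = 0.0     # running sum of trapezoid areas

    def update(self, error):
        # Backward-difference derivative of the error
        derivative = (error - self.prev_error) / self.ts
        # Trapezoidal area between the previous and current error samples
        self.integral += 0.5 * (error + self.prev_error) * self.ts
        self.prev_error = error
        return (self.kp * error
                + self.kd * derivative
                + self.ki * self.integral)

# One controller per decoupled loop; gains are placeholders, not tuned values.
heading_pid = DiscretePID(kp=1.0, ki=0.1, kd=0.05, ts=0.5)
velocity_pid = DiscretePID(kp=0.5, ki=0.05, kd=0.02, ts=0.5)
```

Each cycle, the heading loop would be fed the angle to the leader and the velocity loop the difference between the measured and desired following distance, with the two outputs sent to the robot as rotational and translational velocity commands.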

Other than heading and velocity control, the system also has some other features. As
a safety catch for the controller, if the system recognizes the object within a pre-defined
safety distance, it sets $\omega$ and $v$ to zero. Also, if no object is detected within a certain pre-set
timeout period, the robot will set $v$ to zero and $\omega$ to some constant $C$ so that it slowly spins
trying to reacquire the leader. Finally, it can also be trained on other leader signal objects
that inform the robot that it should halt and perform no further recognition.
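
These supervisory rules can be layered on top of the PID outputs. The following is a minimal sketch: the function name, the threshold values and the choice to pass the PID commands through unchanged while the timeout has not yet expired are all assumptions made for illustration.

```python
def supervise(detected, distance_cm, seconds_since_seen, omega, v,
              halt_seen=False, safety_cm=50.0, timeout_s=2.0, search_omega=0.3):
    """Return the (omega, v) command after applying the safety rules.

    omega, v: commands proposed by the heading and velocity PID loops.
    safety_cm, timeout_s, search_omega: hypothetical settings.
    """
    if halt_seen:
        return 0.0, 0.0              # halt object recognized: stop the robot
    if detected and distance_cm < safety_cm:
        return 0.0, 0.0              # leader inside the safety distance
    if not detected and seconds_since_seen > timeout_s:
        return search_omega, 0.0     # spin slowly to reacquire the leader
    return omega, v                  # otherwise use the PID commands
```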

7  Results 


Several  tests  were  undertaken  on  the  software  system  described,  divided  into  two  sections: 
Target  Recognition  and  Robot  Control. 

7.1  Target  Recognition 

Using the hardware described earlier, these tests characterize the functionality of the SIFT
object recognition software. The robot was kept static while various properties of the target
were changed, such as distance, orientation, occlusion and position in the robot’s field of view.
Each data point presented in Figures 7-9 represents 100 target recognitions for
each change in target.
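
As an illustration of how each data point could be reduced from its 100 samples, the sketch below computes percent recognition, mean error and standard deviation. The sample format (a measured distance in cm, or `None` for a missed recognition) and the use of the population standard deviation are assumptions, not details taken from the tests.

```python
import statistics

def summarize(samples, true_distance_cm):
    """Reduce one batch of recognition attempts to summary statistics.

    samples: measured distances in cm, with None for a missed recognition.
    Returns (percent_recognition, mean_error_cm, std_dev_cm).
    """
    hits = [s for s in samples if s is not None]
    percent = 100.0 * len(hits) / len(samples)
    if not hits:
        return percent, None, None   # nothing recognized at this setting
    errors = [s - true_distance_cm for s in hits]
    return percent, statistics.mean(errors), statistics.pstdev(errors)
```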

The “Linux In A Nutshell” book pictured in Figure 2, which served as the leader target for
these tests, is only one example of a typical leader target. Other targets with more or less
texture would produce results quite different from those presented here. It also must be noted
that this is a small target (15cm by 10cm). It is assumed that training on a larger target would
allow the image recognition system to work at longer distances, but would reduce the
practicality for the leader robot or human.

7.1.1  Distance  to  Target

The first experiment tested the effect of distance to target. The robot camera was first
placed at a distance of 25cm from the target to train the image. The target was then
moved along the camera’s x-axis in increments of 50cm, with 100 recognition samples collected
at each point. The results, shown in Figure 7, reveal
a sharp dropoff in system performance at distances beyond 350cm. Furthermore, a slight
offset is found in the distance to target in the 50 to 350cm range. It can be reasonably
assumed that this is due to error in measuring the distance from the focal position of the
camera to the target when training the target object. A small error at this stage would
create much larger errors in the recognition stage, when the target is much further away.

Another factor which affects the overall performance of the system is percent recognition.
If the object is not recognized, no distance and position to the target is returned and the
system cannot function. In these tests, the system recognized the target 100% of the time
at distances up to 4 meters, and 76% of the time at 4.5 meters. However, there was so much
error in the distance measurement at these distances that the results would be unusable by
the robot.

Figure 7: The effect of distance to target on the mean and standard deviation of the distance
measured.

7.1.2  Target  Position 

This test examines the calculation of target position in the robot’s field of view, at a distance
to target of 200cm. For this test, the object was moved in the robot’s yz plane (i.e. left to
right in the robot’s field of view), and this displacement was measured by the object recognition
system. Starting with 0cm displacement, the mean error and standard deviation in distance
measurement are presented in Figure 8. A displacement of 80cm represents the edge of the
camera’s field of view. From this graph it can be seen that there is a slight error in
the position calculated, which increases with displacement. This error results from the
linear radians-per-pixel approximation used to calculate the angle to target from its pixel
location, which ignores the characteristics of the camera lens. Despite this, the error averages
only 2.5cm over the 80cm displacement, and did not noticeably affect system performance in
following the leader object.


7.1.3  Target  Orientation 

Another factor which could potentially affect the system is the orientation of the target.
Because of the mechanics of the SIFT algorithm, rotation in the robot’s yz plane (about
its x-axis) had no substantial effect on any parameter as compared with the previous
tests. However, as common sense would dictate, rotation about the robot’s y or z axes has a
substantial effect, due to the occlusion of details in the target. The system performed
comparably with previous tests up to an angle of 40 degrees about the y or z axes, with an
extremely sharp dropoff in both percent recognition and accuracy beyond this angle.

Figure 8: The effect of horizontal displacement of target on the mean and standard deviation
of the distance measured.

Figure 9: The effect of target occlusion on the standard deviation of the distance measured.

Figure 10: The results of tuning the heading control PID loop.


7.1.4  Target  Occlusion 

The final parameter of the object recognition system tested was the target occlusion. The
distance to target was again set at a static 200cm for this test. The results of occluding
the target by a given percentage are shown in Figure 9. It can be seen that the system
performs very well up to an occlusion of about 50%. Once more, the percent recognition
drops off quite sharply with this parameter. At 200cm, there was 100% recognition with
70% target occlusion, 11% recognition at 80% occlusion, and the system never recognized
the target with 90% occlusion. Once more, it is important to bear in mind that the choice
of target could greatly affect this result. A target with lots of detail spread all across its
surface would be easier for the system to recognize with higher occlusion than one with
large amounts of white space.

7.2  Robot  Control 

The second set of tests examined the tuning and performance of the robot’s rotational and
translational velocity control.

7.2.1  Heading  PID 

The  robot’s  translational  velocity  was  disabled  for  this  test,  and  the  robot  was  oriented  at 
a  distance  of  200cm  in  its  x-axis  and  80cm  in  its  y-axis  from  the  target.  This  presented 
a  step  change  in  orientation  error  to  the  heading  PID  controller.  The  angle  sensed  by  the 
object  recognition  system  versus  sample  number  is  shown  in  Figure  10.  This  system  was 
intentionally  underdamped  for  better  performance  in  the  leader/follower  scenario  to  create 
more  responsive  behaviour  when  the  leader’s  position  is  constantly  changing.  Once  the 
leader  begins  turning,  it  is  likely  that  he  will  continue  turning. 

7.2.2  Velocity  PID 

For  this  test,  the  robot  was  once  more  positioned  200cm  from  the  static  target  object,  with 
the  distance  to  follow  the  leader  set  to  100cm.  This  represented  a  step  change  for  the 
Velocity  PID  controller.  Results  are  shown  in  Figure  11  as  the  robot  moves  itself  to  100cm 
away.  Once  again  this  system  was  intentionally  underdamped  to  provide  stronger  control 
with  changing  setpoints. 

Figure 11: The results of tuning the position control PID loop.

Figure 12: Following distance and velocity with a leader robot moving at 30 cm/sec.


With the robot showing good distance control to a static target, the system was tested
with a moving leader robot. The follower was allowed to settle at its following distance
of 150cm behind a static leader, before the leader accelerated to a constant velocity of 30
cm/sec. As is shown in Figure 12, the follower was able to match the leader’s velocity while
maintaining a fixed following distance fairly well. The follower’s distance to leader (red) and
velocity (green) are shown against sample number (0.5 seconds each).

7.3  System  Performance 

With all of the subsystems functioning well, the full leader/follower behaviour was tested
in an office environment. A number of qualitative observations were made. First, the
system had trouble with tight corners: the limited field of view of the camera kept
it from seeing the leader if the turn was too sharp. Second, the heading PID controller
pursued the leader directly, causing the robot to bump into intervening corners if the
leader was a long distance away.

The execution time for the system was found to be about 500 msec on the hardware
indicated. This meant that the PID loops were only being updated twice per second, hampering
the speed control algorithm. The speed controller seemed to have difficulty estimating the
leader’s speed under acceleration and deceleration, although it would maintain the set
following distance quite well under steady-state conditions. Additionally, the limited top
speed of the platform meant it could not keep up with a human walking at full speed.
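
The sensitivity to update rate can be illustrated with a little arithmetic. The sketch below assumes the leader’s speed is estimated by adding the rate of change of the measured following distance to the follower’s own speed; this is an assumption made for illustration and may differ from the implemented controller. The function name and the 5 cm range error are hypothetical.

```python
def estimate_leader_speed(follower_speed: float, dist_prev: float,
                          dist_now: float, dt: float = 0.5) -> float:
    """Leader speed (cm/sec) = follower speed + rate of change of the gap."""
    return follower_speed + (dist_now - dist_prev) / dt


if __name__ == "__main__":
    # Gap steady at 150 cm: the leader is moving at the follower's speed.
    print(estimate_leader_speed(30.0, 150.0, 150.0))
    # A 5 cm range error in a single sample (an assumed figure) shifts the
    # estimate by 5 / 0.5 = 10 cm/sec.
    print(estimate_leader_speed(30.0, 150.0, 155.0))
```

Under this assumption, each estimate is an average over the previous half second, so it lags the true speed while the leader is accelerating, and a small range error is magnified by the division by the sample period.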

A third limitation of the system was the problem of false positive recognition, which would
cause the robot to chase ghost targets if the actual leader was not in its field of view.
Raising the positive recognition threshold of the image recognition software should reduce
this problem.
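
One way such a threshold could be applied is sketched below. It assumes the recognition software reports how many matched keypoints support each detection; the Detection type, its fields, and gate_detection are hypothetical names for illustration, not part of the software used here.

```python
from typing import NamedTuple, Optional


class Detection(NamedTuple):
    """Hypothetical output of one recognition pass."""
    num_matches: int      # matched keypoints supporting the detection
    bearing_deg: float    # direction to the target relative to the camera axis
    range_cm: float       # estimated distance to the target


def gate_detection(detection: Optional[Detection],
                   min_matches: int) -> Optional[Detection]:
    """Return the detection only if it has enough supporting matches."""
    if detection is None or detection.num_matches < min_matches:
        return None       # treat as "leader not visible"
    return detection
```

A higher min_matches rejects more ghost targets but also discards weak true detections, so the value would need to be tuned against the recognition range required.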

8  Conclusion 


Using an inexpensive, low-resolution camera and a simple robot platform, leader/follower
behaviour has been created using the SIFT algorithm for object recognition and PID control
of rotational and translational velocity. The system was found to be effective at recognizing
and accurately locating everyday objects at distances of up to 3.5 meters. Additionally,
it was able to do so despite changes in orientation of the target object and occlusion by
intervening objects. Using this ability, it successfully followed both human and robot leaders
carrying a fiducial object in an office environment at moderate human walking speeds.

A number of limitations to this system were found. The 3.5 meter effective distance for
the recognition system is adequate for small robots operating indoors, but would not be
adequate for larger outdoor platforms. Additionally, direct pursuit of the leader’s current
position is quite rudimentary and does not work well in complex environments. PID control
loops were time-consuming to tune properly, and the performance of the simple robot
platform limits the applicability of the system as implemented to wider applications.

Despite  these  shortcomings,  the  benefits  of  using  an  object  recognition  technique  for  this 
application  were  immediately  apparent.  Firstly,  the  immunity  to  orientation  and  occlusion 
problems  made  the  system  easy  to  use.  Secondly,  although  it  was  still  beneficial  to  use 
a  fiducial  on  a  human  leader  rather  than  direct  recognition,  it  was  possible  to  use  ad-hoc 
fiducials  chosen  at  application  time  rather  than  during  development.  Object  recognition 
also allows for the implementation of a wide variety of different behaviours based upon a set
of different trained objects, opening the way for new avenues of human-robot cooperation.
Finally, despite the image recognition software iterating at only 2 Hz, the controllers were
able to perform adequately for the task. With increased processing power, robot control
performance would be expected to improve.

9  Future  Work 


Given the simplicity of this system, a large number of improvements and extensions are
available. With regard to object recognition, a number of solutions could provide
improvements. Most easily, training on a larger target object from further away would
immediately improve the ability to recognize the target at a greater distance and provide
more accurate target localization. Second, a higher-resolution camera could provide
similar benefits. If this were still inadequate, a zoom lens controllable by the robot
system would alleviate the issue further. Training the follower robot on multiple sides of
a 3D object rather than on one side of a 2D object would also increase robustness.

To improve path-following performance beyond that enabled by direct PID pursuit,
orientation cues from the bounding box reported by the object recognition software could
be used. With these, more complex methods of control, such as Bezier curves, could be
applied. If it were desirable to track the complete path of a leader very exactly, global
localization would be required. If this were in place, a pan/tilt unit could be added to the
system to allow the vision system to track the leader’s position while the robot followed it
using the Pure Pursuit path tracking algorithm. Additional processing power would also
allow more accurate tracking of the leader by reducing recognition iteration times.
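
The core of Pure Pursuit is a single geometric step: steer along the circular arc that joins the robot to a goal point on the path. The sketch below shows that step under stated assumptions: a global pose estimate (x, y, heading in radians) is available, the goal point has already been chosen on the leader’s recorded path, and the robot frame has x forward and y to the left. All function names are hypothetical.

```python
import math
from typing import Tuple


def to_robot_frame(pose: Tuple[float, float, float],
                   point: Tuple[float, float]) -> Tuple[float, float]:
    """Express a global (x, y) point in the robot frame (x forward, y left)."""
    x, y, theta = pose
    dx, dy = point[0] - x, point[1] - y
    c, s = math.cos(theta), math.sin(theta)
    return (c * dx + s * dy, -s * dx + c * dy)


def pure_pursuit_curvature(goal: Tuple[float, float]) -> float:
    """Curvature of the arc from the robot to a goal point in the robot frame."""
    gx, gy = goal
    dist_sq = gx * gx + gy * gy
    if dist_sq == 0.0:
        raise ValueError("goal point coincides with the robot")
    return 2.0 * gy / dist_sq     # positive curvature turns left


def turn_rate(forward_speed: float, curvature: float) -> float:
    """Angular velocity (rad/sec) that follows the arc at the given speed."""
    return forward_speed * curvature
```

Because the goal point lies on the leader’s path rather than at the leader’s current position, this approach would avoid the corner cutting caused by direct pursuit, at the cost of requiring the localization described above.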

Finally, the potential applications of this type of system need to be explored, such as a
military convoying system. This will require adaptation to an Ackermann-steered vehicle
and testing of the algorithm in less controlled outdoor lighting conditions. Further system
development would also see the introduction of a variety of robot behaviours through the
recognition of a variety of fiducial objects, leading to much easier cooperation between
human and machine.